Overview

Dataset statistics

Number of variables: 11
Number of observations: 53940
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.5 MiB
Average record size in memory: 88.0 B
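The overview counts follow directly from their definitions. The sketch below is a minimal plain-Python version. It assumes rows are held as dicts, with None marking a missing cell. In pandas, the same counts would come from `df.isna().sum().sum()` and `df.duplicated().sum()`. The memory figures are consistent with 8 bytes per cell: 53940 rows × 11 columns × 8 B ≈ 4.5 MiB, or 88 B per record.

```python
# Sketch: overview counts computed on rows stored as dicts (an assumption;
# pandas-profiling itself works on a pandas DataFrame).
def dataset_stats(rows):
    columns = list(rows[0])  # assumes every row has the same keys
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    # None stands in for a missing cell in this sketch.
    missing = sum(1 for row in rows for col in columns if row[col] is None)
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key in seen:
            duplicates += 1  # every repeat after the first occurrence counts
        else:
            seen.add(key)
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs,
    }


# The first three rows mirror the report's sample; the last two are
# hypothetical, added only to produce one duplicate and one missing cell.
rows = [
    {"carat": 0.23, "cut": "Ideal", "color": "E", "price": 326},
    {"carat": 0.21, "cut": "Premium", "color": "E", "price": 326},
    {"carat": 0.23, "cut": "Good", "color": "E", "price": 327},
    {"carat": 0.23, "cut": "Good", "color": "E", "price": 327},
    {"carat": None, "cut": "Good", "color": "J", "price": 335},
]
print(dataset_stats(rows))
```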

Variable types

Numeric: 8
Categorical: 3

Warnings

price is highly correlated with carat (High correlation)
carat is highly correlated with price and 3 other fields (High correlation)
x is highly correlated with carat and 2 other fields (High correlation)
y is highly correlated with carat and 2 other fields (High correlation)
z is highly correlated with carat and 2 other fields (High correlation)
df_index has unique values (Unique)
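The warnings combine per-column checks with a pairwise correlation check. The report groups flagged pairs per variable ("carat is highly correlated with price and 3 other fields").

The sketch below is an assumption, not pandas-profiling's rule. It flags a column whose values are all distinct. It also flags any pair whose Pearson's r has an absolute value at or above a hypothetical threshold of 0.9. The report's actual coefficient and threshold come from its configuration, which is not reproduced here.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    # Assumes both columns are non-constant (otherwise the denominator is 0).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def warnings_for(columns, threshold=0.9):
    """Hypothetical warning rules; threshold=0.9 is an illustrative cut-off."""
    found = []
    for name, values in columns.items():
        if len(set(values)) == len(values):
            found.append(f"{name} has unique values")
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            found.append(f"{a} is highly correlated with {b}")
    return found


columns = {
    "id": [1, 2, 3, 4, 5],  # every value distinct, like df_index
    "a": [2, 1, 2, 1, 2],
    "b": [4, 2, 4, 2, 4],   # exactly 2 * a
}
print(warnings_for(columns))
```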

Reproduction

Analysis started: 2020-11-10 06:10:38.259793
Analysis finished: 2020-11-10 06:11:01.705371
Duration: 23.45 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 53940
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26970.5
Minimum: 1
Maximum: 53940
Zeros: 0
Zeros (%): 0.0%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 2697.95
Q1: 13485.75
Median: 26970.5
Q3: 40455.25
95th percentile: 51243.05
Maximum: 53940
Range: 53939
Interquartile range (IQR): 26969.5
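Because df_index is just the integers 1 to 53940, its quantiles can be checked by hand. Linear interpolation between order statistics reproduces every value in the table. This method is the default in pandas `Series.quantile`; that the report uses it is an assumption.

For the 5th percentile, the 0-based position is 0.05 × 53939 = 2696.95. The result therefore lies 95% of the way from the 2697th to the 2698th smallest value, giving 2697.95.

```python
import math


def quantile_linear(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)          # 0-based fractional position
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


values = list(range(1, 53941))  # df_index takes every integer from 1 to 53940
q = {name: quantile_linear(values, p) for name, p in
     [("5%", 0.05), ("Q1", 0.25), ("median", 0.5), ("Q3", 0.75), ("95%", 0.95)]}
iqr = q["Q3"] - q["Q1"]
print(q, iqr)
```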

Descriptive statistics

Standard deviation: 15571.2811
Coefficient of variation (CV): 0.5773449175
Kurtosis: -1.2
Mean: 26970.5
Median Absolute Deviation (MAD): 13485
Skewness: 0
Sum: 1454788770
Variance: 242464795
Monotonicity: Strictly increasing
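These descriptive statistics follow standard definitions, and df_index again makes a convenient check. Its kurtosis of -1.2 is the excess kurtosis of a uniform distribution. Its MAD of 13485 is the median of |x - median|, not the mean absolute deviation.

The sketch below assumes the sample variance (ddof=1). It also assumes the bias-corrected skewness and excess-kurtosis estimators used by pandas `Series.skew` and `Series.kurt`; that the report computes them exactly this way is an assumption.

```python
import math
import statistics


def describe(values):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(d * d for d in dev)      # sums of powered deviations
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    variance = s2 / (n - 1)           # sample variance (ddof=1)
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Median absolute deviation: the median of |x - median|, unscaled.
    mad = statistics.median(abs(v - median) for v in values)
    # Bias-corrected skewness and excess kurtosis (pandas' estimators).
    skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    kurtosis = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean, "std": std, "cv": std / mean, "variance": variance,
        "mad": mad, "skewness": skewness, "kurtosis": kurtosis,
        "sum": sum(values),
    }


stats = describe(list(range(1, 53941)))  # df_index: the integers 1..53940
print(stats)
```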
Figure: Histogram with fixed size bins (bins=50)
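With fixed-size bins, the observed range is split into 50 intervals of equal width. For df_index, each bin spans (53940 - 1) / 50 = 1078.78 values.

The sketch below is a minimal version of that binning. NumPy's `numpy.histogram(values, bins=50)` does the same job and treats bin edges more carefully. As in NumPy, the maximum value is placed in the last bin.

```python
def fixed_width_histogram(values, bins=50):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Bins are half-open [edge, edge + width); the maximum, which would
        # fall just past the last bin, is folded into it.
        i = int((v - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(edges, counts)
```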
Common values
Value                   Count   Frequency (%)
2047                    1       < 0.1%
6742                    1       < 0.1%
12947                   1       < 0.1%
14994                   1       < 0.1%
8849                    1       < 0.1%
10896                   1       < 0.1%
53903                   1       < 0.1%
49805                   1       < 0.1%
51852                   1       < 0.1%
37511                   1       < 0.1%
Other values (53930)    53930   > 99.9%
Minimum 5 values
Value   Count   Frequency (%)
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
53940   1       < 0.1%
53939   1       < 0.1%
53938   1       < 0.1%
53937   1       < 0.1%
53936   1       < 0.1%

carat
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 273
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7979397479
Minimum: 0.2
Maximum: 5.01
Zeros: 0
Zeros (%): 0.0%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 0.2
5th percentile: 0.3
Q1: 0.4
Median: 0.7
Q3: 1.04
95th percentile: 1.7
Maximum: 5.01
Range: 4.81
Interquartile range (IQR): 0.64

Descriptive statistics

Standard deviation: 0.4740112444
Coefficient of variation (CV): 0.5940439058
Kurtosis: 1.256635333
Mean: 0.7979397479
Median Absolute Deviation (MAD): 0.32
Skewness: 1.116645921
Sum: 43040.87
Variance: 0.2246866598
Monotonicity: Not monotonic
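Monotonicity compares each value with the next one in row order. df_index rises at every step ("Strictly increasing"), while carat does not ("Not monotonic").

The sketch below labels the non-strict cases "Increasing" and "Decreasing". These two labels are hypothetical, since only "Strictly increasing" and "Not monotonic" appear in this report.

```python
def monotonicity(values):
    pairs = list(zip(values, values[1:]))  # consecutive (previous, next) pairs
    # Note: a list with fewer than two values counts as strictly increasing.
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"            # hypothetical label
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"            # hypothetical label
    return "Not monotonic"


# carat for the first ten rows of the sample
carat = [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23]
print(monotonicity(list(range(1, 11))), monotonicity(carat))
```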
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
0.3                     2604    4.8%
0.31                    2249    4.2%
1.01                    2242    4.2%
0.7                     1981    3.7%
0.32                    1840    3.4%
1                       1558    2.9%
0.9                     1485    2.8%
0.41                    1382    2.6%
0.4                     1299    2.4%
0.71                    1294    2.4%
Other values (263)      36006   66.8%
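Each "Common values" table lists the most frequent values with their share of the 53940 rows. The rest are folded into a single "Other values (m)" row. For carat, the ten listed values cover 17934 rows, leaving 53940 - 17934 = 36006 rows (66.8%) spread over the remaining 273 - 10 = 263 distinct values.

The sketch below builds such a table with `collections.Counter`. The percentage formatting mimics the report's "< 0.1%" and "> 99.9%" labels. Ties among equal counts are listed in order of first occurrence; this is a property of `Counter.most_common`, not necessarily of the report.

```python
from collections import Counter


def format_pct(p):
    # Mirrors the report's display of very small and very large shares.
    if 0 < p < 0.1:
        return "< 0.1%"
    if 99.9 < p < 100:
        return "> 99.9%"
    return f"{p:.1f}%"


def common_values(values, top=10):
    n = len(values)
    counts = Counter(values)
    rows = [(v, c, format_pct(100 * c / n)) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    other = (f"Other values ({len(counts) - len(rows)})", n - shown,
             format_pct(100 * (n - shown) / n))
    return rows, other


# carat for the first ten rows of the sample
carat = [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23]
rows, other = common_values(carat, top=2)
for row in rows + [other]:
    print(*row, sep="\t")
```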
 
Minimum 5 values
Value   Count   Frequency (%)
0.2     12      < 0.1%
0.21    9       < 0.1%
0.22    5       < 0.1%
0.23    293     0.5%
0.24    254     0.5%
Maximum 5 values
Value   Count   Frequency (%)
5.01    1       < 0.1%
4.5     1       < 0.1%
4.13    1       < 0.1%
4.01    2       < 0.1%
4       1       < 0.1%

cut
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 421.4 KiB
Value       Count   Frequency (%)
Ideal       21551   40.0%
Premium     13791   25.6%
Very Good   12082   22.4%
Good        4906    9.1%
Fair        1610    3.0%
Figure: Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Figure: Histogram of lengths of the category

Length

Max length: 9
Median length: 5
Mean length: 6.286503522
Min length: 4
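The length statistics count one string length per observation, so frequent categories dominate. For cut, (5 × 21551 + 7 × 13791 + 9 × 12082 + 4 × 4906 + 4 × 1610) / 53940 = 339094 / 53940 ≈ 6.2865. The "Very Good" label has 9 characters, counting the space.

The sketch below recomputes the length table from the category counts above. It also computes "Unique", read here as the number of categories that occur exactly once; this reading of the report's definition is an assumption.

```python
import statistics

# Category counts for `cut`, taken from the frequency table of the report.
cut_counts = {"Ideal": 21551, "Premium": 13791, "Very Good": 12082,
              "Good": 4906, "Fair": 1610}


def length_stats(counts):
    # One length per observation, so frequent categories weigh more.
    lengths = [len(cat) for cat, c in counts.items() for _ in range(c)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        # Assumed meaning of "Unique": categories that occur exactly once.
        "unique": sum(1 for c in counts.values() if c == 1),
    }


print(length_stats(cut_counts))
```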

color
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 421.4 KiB
Value   Count   Frequency (%)
G       11292   20.9%
E       9797    18.2%
F       9542    17.7%
H       8304    15.4%
D       6775    12.6%
I       5422    10.1%
J       2808    5.2%
Figure: Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Figure: Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

clarity
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 421.4 KiB
Value   Count   Frequency (%)
SI1     13065   24.2%
VS2     12258   22.7%
SI2     9194    17.0%
VS1     8171    15.1%
VVS2    5066    9.4%
VVS1    3655    6.8%
IF      1790    3.3%
I1      741     1.4%
Figure: Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Figure: Histogram of lengths of the category

Length

Max length: 4
Median length: 3
Mean length: 3.114757138
Min length: 2

depth
Real number (ℝ≥0)

Distinct: 184
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.74940489
Minimum: 43
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 43
5th percentile: 59.3
Q1: 61
Median: 61.8
Q3: 62.5
95th percentile: 63.8
Maximum: 79
Range: 36
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 1.432621319
Coefficient of variation (CV): 0.02320056884
Kurtosis: 5.739414582
Mean: 61.74940489
Median Absolute Deviation (MAD): 0.7
Skewness: -0.0822940263
Sum: 3330762.9
Variance: 2.052403843
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
62                      2239    4.2%
61.9                    2163    4.0%
61.8                    2077    3.9%
62.2                    2039    3.8%
62.1                    2020    3.7%
61.6                    1956    3.6%
62.3                    1940    3.6%
61.7                    1904    3.5%
62.4                    1792    3.3%
61.5                    1719    3.2%
Other values (174)      34091   63.2%
Minimum 5 values
Value   Count   Frequency (%)
43      2       < 0.1%
44      1       < 0.1%
50.8    1       < 0.1%
51      1       < 0.1%
52.2    1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
79      2       < 0.1%
78.2    1       < 0.1%
73.6    1       < 0.1%
72.9    1       < 0.1%
72.2    1       < 0.1%

table
Real number (ℝ≥0)

Distinct: 127
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.45718391
Minimum: 43
Maximum: 95
Zeros: 0
Zeros (%): 0.0%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 43
5th percentile: 54
Q1: 56
Median: 57
Q3: 59
95th percentile: 61
Maximum: 95
Range: 52
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.234490563
Coefficient of variation (CV): 0.03888966376
Kurtosis: 2.80185686
Mean: 57.45718391
Median Absolute Deviation (MAD): 1
Skewness: 0.7968958487
Sum: 3099240.5
Variance: 4.992948075
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
56                      9881    18.3%
57                      9724    18.0%
58                      8369    15.5%
59                      6572    12.2%
55                      6268    11.6%
60                      4241    7.9%
54                      2594    4.8%
61                      2282    4.2%
62                      1273    2.4%
63                      588     1.1%
Other values (117)      2148    4.0%
Minimum 5 values
Value   Count   Frequency (%)
43      1       < 0.1%
44      1       < 0.1%
49      2       < 0.1%
50      2       < 0.1%
50.1    1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
95      1       < 0.1%
79      1       < 0.1%
76      1       < 0.1%
73      4       < 0.1%
71      1       < 0.1%

price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11602
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3932.799722
Minimum: 326
Maximum: 18823
Zeros: 0
Zeros (%): 0.0%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 326
5th percentile: 544
Q1: 950
Median: 2401
Q3: 5324.25
95th percentile: 13107.1
Maximum: 18823
Range: 18497
Interquartile range (IQR): 4374.25

Descriptive statistics

Standard deviation: 3989.439738
Coefficient of variation (CV): 1.014401958
Kurtosis: 2.177695759
Mean: 3932.799722
Median Absolute Deviation (MAD): 1670
Skewness: 1.618395283
Sum: 212135217
Variance: 15915629.42
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
605                     132     0.2%
802                     127     0.2%
625                     126     0.2%
828                     125     0.2%
776                     124     0.2%
789                     121     0.2%
698                     121     0.2%
544                     120     0.2%
666                     114     0.2%
552                     113     0.2%
Other values (11592)    52717   97.7%
Minimum 5 values
Value   Count   Frequency (%)
326     2       < 0.1%
327     1       < 0.1%
334     1       < 0.1%
335     1       < 0.1%
336     2       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
18823   1       < 0.1%
18818   1       < 0.1%
18806   1       < 0.1%
18804   1       < 0.1%
18803   1       < 0.1%

x
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 554
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.731157212
Minimum: 0
Maximum: 10.74
Zeros: 8
Zeros (%): < 0.1%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 4.29
Q1: 4.71
Median: 5.7
Q3: 6.54
95th percentile: 7.66
Maximum: 10.74
Range: 10.74
Interquartile range (IQR): 1.83

Descriptive statistics

Standard deviation: 1.121760747
Coefficient of variation (CV): 0.1957302348
Kurtosis: -0.6181606709
Mean: 5.731157212
Median Absolute Deviation (MAD): 0.93
Skewness: 0.3786763426
Sum: 309138.62
Variance: 1.258347173
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
4.37                    448     0.8%
4.34                    437     0.8%
4.33                    429     0.8%
4.38                    428     0.8%
4.32                    425     0.8%
4.35                    407     0.8%
4.39                    388     0.7%
4.31                    387     0.7%
4.36                    386     0.7%
4.4                     373     0.7%
Other values (544)      49832   92.4%
Minimum 5 values
Value   Count   Frequency (%)
0       8       < 0.1%
3.73    2       < 0.1%
3.74    1       < 0.1%
3.76    1       < 0.1%
3.77    1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
10.74   1       < 0.1%
10.23   1       < 0.1%
10.14   1       < 0.1%
10.02   1       < 0.1%
10.01   1       < 0.1%

y
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 552
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.734525955
Minimum: 0
Maximum: 58.9
Zeros: 7
Zeros (%): < 0.1%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 4.3
Q1: 4.72
Median: 5.71
Q3: 6.54
95th percentile: 7.65
Maximum: 58.9
Range: 58.9
Interquartile range (IQR): 1.82

Descriptive statistics

Standard deviation: 1.142134674
Coefficient of variation (CV): 0.1991681062
Kurtosis: 91.21455716
Mean: 5.734525955
Median Absolute Deviation (MAD): 0.92
Skewness: 2.434166716
Sum: 309320.33
Variance: 1.304471614
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
4.34                    437     0.8%
4.37                    435     0.8%
4.35                    425     0.8%
4.33                    421     0.8%
4.32                    414     0.8%
4.39                    407     0.8%
4.38                    406     0.8%
4.4                     387     0.7%
4.31                    386     0.7%
4.41                    384     0.7%
Other values (542)      49838   92.4%
Minimum 5 values
Value   Count   Frequency (%)
0       7       < 0.1%
3.68    1       < 0.1%
3.71    2       < 0.1%
3.72    1       < 0.1%
3.73    1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
58.9    1       < 0.1%
31.8    1       < 0.1%
10.54   1       < 0.1%
10.16   1       < 0.1%
10.1    1       < 0.1%

z
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 375
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.538733778
Minimum: 0
Maximum: 31.8
Zeros: 20
Zeros (%): < 0.1%
Memory size: 421.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.65
Q1: 2.91
Median: 3.53
Q3: 4.04
95th percentile: 4.73
Maximum: 31.8
Range: 31.8
Interquartile range (IQR): 1.13

Descriptive statistics

Standard deviation: 0.7056988469
Coefficient of variation (CV): 0.1994212877
Kurtosis: 47.08661933
Mean: 3.538733778
Median Absolute Deviation (MAD): 0.57
Skewness: 1.522422559
Sum: 190879.3
Variance: 0.4980108626
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values
Value                   Count   Frequency (%)
2.7                     767     1.4%
2.69                    748     1.4%
2.71                    738     1.4%
2.68                    730     1.4%
2.72                    697     1.3%
2.67                    649     1.2%
2.73                    612     1.1%
2.66                    555     1.0%
2.74                    548     1.0%
4.02                    538     1.0%
Other values (365)      47358   87.8%
Minimum 5 values
Value   Count   Frequency (%)
0       20      < 0.1%
1.07    1       < 0.1%
1.41    1       < 0.1%
1.53    1       < 0.1%
2.06    1       < 0.1%
Maximum 5 values
Value   Count   Frequency (%)
31.8    1       < 0.1%
8.06    1       < 0.1%
6.98    1       < 0.1%
6.72    1       < 0.1%
6.43    1       < 0.1%

Interactions

Figure: pairwise interaction plots of the numeric variables (64 panels)

Correlations

Figure: correlation matrix heatmaps, one for each coefficient described below

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a consequence, points lying exactly on a non-horizontal line give r = +1 or r = -1, whatever the line's angle to the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
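The definition translates directly into code. The sketch below is plain Python and uses the first ten rows of the sample. The n - 1 denominators cancel, so the sample and population conventions give the same r. The rescaled call illustrates the location and scale invariance (multiplying by 0.2 converts carats to grams).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


# carat and price for the first ten rows of the sample
carat = [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23]
price = [326, 326, 327, 334, 335, 336, 336, 337, 337, 338]
print(pearson_r(carat, price))
# Rescaling carat (carats to grams) leaves r unchanged.
print(pearson_r([0.2 * c for c in carat], price))
```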

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
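Spearman's ρ is Pearson's r applied to ranks, so tied values need a ranking rule. The sketch below gives each tie group the average of the ranks it spans. This is the usual convention (for example, the default `method="average"` of `scipy.stats.rankdata`); that the report uses it is an assumption. The example shows ρ = 1 for a monotonic but nonlinear relationship, where r stays below 1.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```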

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

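To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant if X and Y order its two observations the same way, and discordant if they order them in opposite ways. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.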
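Counting pairs directly gives τ in O(n²) time. That is fine for illustration but slow for all 53940 rows. The sketch below implements exactly the definition above, often called τ-a.

Libraries usually compute τ-b instead, which adjusts the denominator for ties; SciPy's `scipy.stats.kendalltau` does so by default. The two therefore differ when ties are present, as they are in carat and price.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1      # both variables order the pair the same way
        elif s < 0:
            discordant += 1      # the variables order the pair oppositely
        # s == 0: tied pair, counted in neither (tau-a keeps n(n-1)/2 below)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```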

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The usual empirical estimator of Cramér's V has been shown to be biased, even for large samples. The report therefore uses the bias-corrected measure proposed by Bergsma (2013).
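Bergsma's correction works on an r × k contingency table with n observations. Starting from φ² = χ²/n, it subtracts the expected bias and shrinks the table dimensions:

- φ̃² = max(0, φ² - (k-1)(r-1)/(n-1))
- r̃ = r - (r-1)²/(n-1)
- k̃ = k - (k-1)²/(n-1)
- Ṽ = sqrt(φ̃² / min(k̃-1, r̃-1))

The sketch below is a plain-Python version of these formulas. Whether it matches the report's implementation line for line is an assumption.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts.

    Assumes no row or column total is zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(k)]
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[10, 0], [0, 10]]))  # perfect association
print(cramers_v_corrected([[5, 5], [5, 5]]))    # independence
```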

Missing values

Figure: missing-value plots (2 panels)

Sample

First rows

   df_index  carat  cut        color  clarity  depth  table  price  x     y     z
0  1         0.23   Ideal      E      SI2      61.5   55.0   326    3.95  3.98  2.43
1  2         0.21   Premium    E      SI1      59.8   61.0   326    3.89  3.84  2.31
2  3         0.23   Good       E      VS1      56.9   65.0   327    4.05  4.07  2.31
3  4         0.29   Premium    I      VS2      62.4   58.0   334    4.20  4.23  2.63
4  5         0.31   Good       J      SI2      63.3   58.0   335    4.34  4.35  2.75
5  6         0.24   Very Good  J      VVS2     62.8   57.0   336    3.94  3.96  2.48
6  7         0.24   Very Good  I      VVS1     62.3   57.0   336    3.95  3.98  2.47
7  8         0.26   Very Good  H      SI1      61.9   55.0   337    4.07  4.11  2.53
8  9         0.22   Fair       E      VS2      65.1   61.0   337    3.87  3.78  2.49
9  10        0.23   Very Good  H      VS1      59.4   61.0   338    4.00  4.05  2.39

Last rows

       df_index  carat  cut        color  clarity  depth  table  price  x     y     z
53930  53931     0.71   Premium    E      SI1      60.5   55.0   2756   5.79  5.74  3.49
53931  53932     0.71   Premium    F      SI1      59.8   62.0   2756   5.74  5.73  3.43
53932  53933     0.70   Very Good  E      VS2      60.5   59.0   2757   5.71  5.76  3.47
53933  53934     0.70   Very Good  E      VS2      61.2   59.0   2757   5.69  5.72  3.49
53934  53935     0.72   Premium    D      SI1      62.7   59.0   2757   5.69  5.73  3.58
53935  53936     0.72   Ideal      D      SI1      60.8   57.0   2757   5.75  5.76  3.50
53936  53937     0.72   Good       D      SI1      63.1   55.0   2757   5.69  5.75  3.61
53937  53938     0.70   Very Good  D      SI1      62.8   60.0   2757   5.66  5.68  3.56
53938  53939     0.86   Premium    H      SI2      61.0   58.0   2757   6.15  6.12  3.74
53939  53940     0.75   Ideal      D      SI2      62.2   55.0   2757   5.83  5.87  3.64